REMARKS

The Applicants respectfully request further examination and reconsideration in view of 
the comments set forth fully below. Claims 1-51 were pending. Within the Office Action, 
Claims 1-51 have been rejected. Accordingly, Claims 1-51 are now pending in the application. 

Objections to the Claims: 

Within the Office Action, Claim 41 has been objected to because of informalities. 
Specifically, it is stated that punctuation is missing from the claim. The Applicants respectfully 
submit that Claim 41 is properly punctuated. Accordingly, the objection to the claim should be 
withdrawn. 

Rejections Under 35 U.S.C. § 103:

Within the Office Action, Claims 1, 2, 7, 8, 11, 13-16, 41, 44, 50 and 51 have been
rejected under 35 U.S.C. § 103(a) as being unpatentable over A Bit-Serial VLSI Array 
Processing Chip for Image Processing, IEEE Journal of Solid-State Circuits, Vol. 25, No. 2, 
April 1990 to Heaton et al. (hereinafter "Heaton"). The Applicants respectfully disagree. 

Heaton teaches an array processing chip which integrates many processing elements on a 
single die. Each processing element has several components including a 16-function logical unit, 
an adder, a shift register and local RAM. [Heaton, Abstract] The shift register is used as a local 
accumulator to hold arithmetic operands. [Heaton, page 366, ¶2] Heaton also teaches:

An OR tree is connected to all PE's on the chip, enabling the values 
presented on each of the 128 PE data buses to be ORed together. This feature 
enables the user to quickly test for a "true" bit in any of the PE's of the array. The 
OR tree is useful in associative operations and in performing data searches. OR 
tree operations are pipelined. The single OR pin output is open drain, enabling 
several BLITZEN chips to be directly wire ORed together. [Heaton, page 367, ¶2]

However, Heaton does not teach a global accumulation unit to accumulate the results of the 
processing operations for each processing element. 

In contrast to Heaton, the present invention is directed to a video platform architecture for
video processing, including complex video compression/decompression algorithms, in a computer
with a two-dimensional Single-Instruction Multiple-Data (SIMD) array architecture. The video
platform architecture includes one or more video processing modules, on-chip shared memory,
and a general-purpose RISC central processing unit (CPU) used as a system controller. Each video
processing module includes a rectangular array of processing elements (PEs), a block load/store
unit, a global-accumulation unit and a general-purpose CPU used as a local controller. Video to
be processed is configured into blocks of data. A plurality of registers are provided in the
processing elements and the block load/store unit to support two-dimensional processing of the 
data blocks. Types of registers used include block registers, vector registers, scalar registers, and 
exchange registers. Each of these registers is designed to hold a short ordered one- or
two-dimensional set of video data (data blocks). These registers are arranged in a hierarchical
configuration along the data flow path between the on-chip memory and processing units within 
the PE array. [Present Specification, Abstract] 

Furthermore, in some embodiments, the global accumulation unit includes 4 slice 
accumulation (SACC) registers, 1 global PE mask control register, and 1 global accumulation 
(GACC) register. In some embodiments, there is one SACC register for each vertical PE slice of 
the PE array. The SACC registers are the intermediate registers in the operations moving data 
from the LACC register of each PE to the GACC register. In some embodiments, there are four
40-bit SACC registers in the global accumulation unit. Each of the SACC registers includes three
individually written sections, namely low 16-bits, middle 16-bits, and high 8-bits. Each PE's
40-bit LACC is read in steps, specifying which part of the LACC, low 16-bits, middle 16-bits, or
high 8-bits, is to be placed on the 16-bit bus to the global accumulation unit, and finally into
the corresponding section of the appropriate SACC register. During operation of the global
accumulation unit, either the full 40-bit values or packed 20-bit values of the SACC register 
involved in the accumulation operations are added together by a global add instruction and a 
global add and accumulate instruction. The GACC register is used to perform global 
accumulation of LACC values from multiple PEs loaded into the corresponding SACC registers. 
In some embodiments, there is one 48-bit GACC register in the global accumulation unit. 
[Present Specification, page 15, line 24 through page 16, line 8] As described above, Heaton does 
not teach a global accumulation unit to accumulate the results of the processing operations for 
each processing element. 
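
By way of illustration only, the data path described above (LACC to SACC to GACC) can be
sketched as follows. This is a minimal behavioral model and not the implementation of the
present specification. The function names, the unsigned arithmetic, the wrap-around at 48 bits
and the simplification of one LACC value per SACC register are all assumptions made for the
sketch; only the register widths and the three-section transfer are taken from the text above.

```python
# Illustrative model only; names and unsigned wrap-around behavior are assumed.
LACC_BITS = 40   # width of a PE's local accumulator (from the text)
GACC_BITS = 48   # width of the global accumulation register (from the text)


def lacc_sections(lacc):
    """Split a 40-bit LACC value into (low 16, middle 16, high 8) sections."""
    lacc &= (1 << LACC_BITS) - 1
    return lacc & 0xFFFF, (lacc >> 16) & 0xFFFF, (lacc >> 32) & 0xFF


def load_sacc(lacc):
    """Move one LACC value into a SACC register in three bus transfers."""
    low, mid, high = lacc_sections(lacc)
    sacc = 0
    sacc |= low            # step 1: low 16 bits over the 16-bit bus
    sacc |= mid << 16      # step 2: middle 16 bits
    sacc |= high << 32     # step 3: high 8 bits
    return sacc


def global_add(saccs):
    """Add the full 40-bit SACC values together into a 48-bit result."""
    return sum(saccs) & ((1 << GACC_BITS) - 1)


def global_add_accumulate(gacc, saccs):
    """Add the SACC values to the running GACC value."""
    return (gacc + sum(saccs)) & ((1 << GACC_BITS) - 1)
```

In this sketch, four SACC registers are loaded with `load_sacc`, summed once with
`global_add`, and folded into a running total with `global_add_accumulate`, so the result
reflects a contribution from every slice rather than from only two operands.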

It is asserted within the Response to Arguments section of the Office Action that since 
Heaton teaches local accumulation, it would have been obvious and is fairly suggested by the 
reference to store results for each processing element in one central location (global 
accumulation unit). The Applicants respectfully disagree with this assertion. The Examiner's 
assertion is based on improper hindsight reasoning. Heaton does not suggest, teach or disclose a 
global accumulation unit as claimed. Instead, Heaton teaches a local accumulation unit that is a
two-operand adder. Heaton also teaches an OR tree. However, a global accumulation unit, as
suggested by the Examiner, would then require an additional adder and a shift register, similar to
those used in Heaton's local accumulation unit, as described on page 366, ¶2 and Figure 6. Such
a global accumulation unit, however, would not be able to accumulate the results of the
processing operations for each processing element, as taught in the present specification; rather,
this global accumulation unit would only add two results together. Further, Heaton's OR tree is 
connected to all PE's on the chip, enabling the values presented on each of the 128 data buses to 
be ORed together. OR tree operations are pipelined. As such, the OR tree performs a logical OR 
operation, not an accumulation operation. Accordingly, the Applicants respectfully submit that 
Heaton does not suggest, teach or disclose a global accumulation unit as claimed. 
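
The distinction between a logical OR reduction and an accumulation can be illustrated with a
short sketch. The function names and the sample values are hypothetical, and treating each PE
output as a small integer is an assumption made for illustration; the sketch does not model
Heaton's bit-serial hardware.

```python
from functools import reduce
from operator import or_


def or_tree(values):
    """Bitwise OR of all PE outputs: reports which bits are set in any PE."""
    return reduce(or_, values, 0)


def accumulate(values):
    """Arithmetic sum of all PE outputs: every PE contributes its magnitude."""
    return sum(values)


pe_results = [3, 5, 6, 0]          # hypothetical per-PE results
or_result = or_tree(pe_results)    # 3 | 5 | 6 | 0
sum_result = accumulate(pe_results)
```

For these sample values the OR reduction yields 7 while the sum is 14, so the OR result cannot
stand in for an accumulated total.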

The independent Claim 1 is directed to a video processing apparatus. The video 
processing apparatus of Claim 1 comprises a memory and one or more video processing 
modules, each video processing module coupled to the memory and comprising a programmable 
array of processing elements, each processing element including local registers to provide data 
used in processing operations and to store results of the processing operations, a block load and 
store unit coupled to the programmable array of processing elements to load, store, and send data 
transferred back and forth between the memory and the array of processing elements, a global 
accumulation unit to accumulate the results of the processing operations for each processing 
element and a local controller to provide instructions and parameters related to the processing 
operations and data transfer. As described above, Heaton does not teach or make obvious a 
global accumulation unit to accumulate the results of the processing operations for each 
processing element. For at least these reasons, the independent Claim 1 is allowable over the 
teachings of Heaton. 

Claims 2, 7, 8, 11 and 13-16 are all dependent upon the independent Claim 1. As
described above, the independent Claim 1 is allowable over the teachings of Heaton.
Accordingly, Claims 2, 7, 8, 11 and 13-16 are all also allowable as being dependent upon an
allowable base claim. 

The independent Claim 41 is directed to a programmable array of processing elements to 
process video, each processing element comprising local registers to store video data blocks 
received from a main memory, to process the received video data blocks, and to store results of 
processing the video data blocks, wherein each processing element is configured to send the 
results to a global accumulation unit to accumulate the results of the processing operations for 
each processing element. As described above, Heaton does not teach or make obvious wherein
each processing element is configured to send the results to a global accumulation unit to 
accumulate the results of the processing operations for each processing element. For at least 
these reasons, the independent Claim 41 is allowable over the teachings of Heaton. 

Claims 44, 50 and 51 are all dependent upon the independent Claim 41. As described 
above, the independent Claim 41 is allowable over the teachings of Heaton. Accordingly, Claims 
44, 50 and 51 are all also allowable as being dependent upon an allowable base claim. 

Within the Office Action, Claims 3, 4, 9, 10, 42, 43, 45 and 46 have been rejected under 
35 U.S.C. § 103(a) as being unpatentable over Heaton in view of U.S. Patent No. 4,992,933 to 
Taylor (hereinafter "Taylor"). The Applicants respectfully disagree. 

Claims 3, 4, 9 and 10 are all dependent upon the independent Claim 1. Claims 42, 43, 45 
and 46 are all dependent upon the independent Claim 41. As described above, the independent
Claims 1 and 41 are allowable over the teachings of Heaton. Accordingly, Claims 3, 4, 9, 10, 42, 
43, 45 and 46 are all also allowable as being dependent upon an allowable base claim. 

Within the Office Action, Claims 12 and 49 have been rejected under 35 U.S.C. § 103(a) 
as being unpatentable over Heaton in view of U.S. Patent No. 4,745,547 to Buchholz (hereinafter 
"Buchholz"). The Applicants respectfully disagree. 

Claim 12 is dependent upon the independent Claim 1. Claim 49 is dependent upon the 
independent Claim 41. As described above, the independent Claims 1 and 41 are allowable over 
the teachings of Heaton. Accordingly, Claims 12 and 49 are both also allowable as being 
dependent upon an allowable base claim. 

Within the Office Action, Claims 5, 6, 15, 47 and 48 have been rejected under 35 U.S.C. 
§ 103(a) as being unpatentable over Heaton in view of U.S. Patent No. 5,680,338 to Agarwal et 
al. (hereinafter "Agarwal"). The Applicants respectfully disagree. 

Claims 5, 6 and 15 are all dependent upon the independent Claim 1. Claims 47 and 48
are dependent upon the independent Claim 41. As described above, the independent Claims 1 
and 41 are allowable over the teachings of Heaton. Accordingly, Claims 5, 6, 15, 47 and 48 are 
all also allowable as being dependent upon an allowable base claim. 

Within the Office Action, Claims 17-26, 28-38 and 40 have been rejected under 35 
U.S.C. § 103(a) as being unpatentable over Heaton in view of Taylor and further in view of 
Agarwal. The Applicants respectfully disagree. 

As described above, Heaton does not teach a global accumulation unit to accumulate the 
results of the processing operations for each processing element. In addition, as recognized 
within the Office Action, Heaton does not teach moving results into a local controller or loading
data blocks from the array of block registers to the array of vector registers. Taylor and Agarwal 
are apparently cited for these reasons. However, Taylor and Agarwal also do not teach a global 
accumulation unit to accumulate the results of the processing operations for each processing 
element. Thus, the combination of Heaton, Taylor and Agarwal does not teach a global 
accumulation unit to accumulate the results of the processing operations for each processing 
element. 

The independent Claim 17 is directed to a method of processing video. The method of 
Claim 17 comprises configuring a video stream into data blocks, loading data blocks from 
memory to a first array of exchange registers, loading data blocks from the first array of exchange 
registers to a programmable array of processing elements, wherein each processing element 
within the array of processing elements includes an array of block registers, an array of vector 
registers, and a local accumulator, the data blocks are loaded from the first array of exchange 
registers to the array of block registers, loading the data blocks from the array of block registers 
to the array of vector registers, processing the data blocks loaded in the array of vector registers 
and storing results in the corresponding local accumulator for each processing element, 
accumulating the results stored in the local accumulators in a global accumulator, thereby 
forming accumulated results and moving the accumulated results into a local controller. As 
described above, neither Heaton, Taylor, Agarwal nor their combination teaches accumulating the
results stored in the local accumulators in a global accumulator, thereby forming accumulated 
results. For at least these reasons, the independent Claim 17 is allowable over the teachings of 
Heaton, Taylor, Agarwal and their combination. 

Claims 18-26 and 28 are all dependent upon the independent Claim 17. As described 
above, the independent Claim 17 is allowable over the teachings of Heaton, Taylor, Agarwal and 
their combination. Accordingly, Claims 18-26 and 28 are all also allowable as being dependent 
upon an allowable base claim. 

The independent Claim 29 is directed to a video processing apparatus. The video 
processing apparatus of Claim 29 comprises means for configuring a video stream into data 
blocks, means for loading data blocks from memory to a first array of exchange registers, the 
means for loading data blocks from memory coupled to the means for configuring, means for 
loading data blocks from the first array of exchange registers to a programmable array of 
processing elements, the means for loading data blocks from the first array of exchange registers 
coupled to the means for loading data blocks from memory, wherein each processing element 
within the array of processing elements includes an array of block registers and an array of vector
registers, the data blocks are loaded from the first array of exchange registers to the array of 
block registers, means for loading the data blocks from the array of block registers to the array of 
vector registers, the means for loading the data blocks from the array of block registers coupled 
to the means for loading data blocks from the first array of exchange registers, means for 
processing the data blocks loaded in the array of vector registers and storing results in the 
corresponding local accumulator for each processing element, the means for processing coupled 
to the means for loading the data blocks from the array of block registers, means for 
accumulating the results stored in the local accumulators in a global accumulator, thereby 
forming accumulated results, the means for accumulating coupled to the means for processing 
and means for moving the accumulated results into a local controller, the means for moving 
coupled to the means for accumulating. As described above, neither Heaton, Taylor, Agarwal 
nor their combination teaches means for accumulating the results stored in the local accumulators
in a global accumulator, thereby forming accumulated results. For at least these reasons, the 
independent Claim 29 is allowable over the teachings of Heaton, Taylor, Agarwal and their 
combination. 

Claims 30-38 and 40 are all dependent upon the independent Claim 29. As described 
above, the independent Claim 29 is allowable over the teachings of Heaton, Taylor, Agarwal and 
their combination. Accordingly, Claims 30-38 and 40 are all also allowable as being dependent 
upon an allowable base claim. 

Within the Office Action, Claims 27 and 39 have been rejected under 35 U.S.C. § 103(a) 
as being unpatentable over Heaton in view of Taylor in view of Agarwal and further in view of 
Buchholz. The Applicants respectfully disagree. 

Claim 27 is dependent upon the independent Claim 17. Claim 39 is dependent upon the 
independent Claim 29. As described above, the independent Claims 17 and 29 are allowable
over the teachings of Heaton, Taylor, Agarwal and their combination. Accordingly, Claims 27 
and 39 are both also allowable as being dependent upon an allowable base claim. 

Applicants respectfully submit that the claims are in a condition for allowance, and
allowance at an early date would be appreciated. Should the Examiner have any questions or 
comments, they are encouraged to call the undersigned at (408) 530-9700 to discuss the same so 
that any outstanding issues can be expeditiously resolved. 

Respectfully submitted, 
HAVERSTOCK & OWENS LLP 



Dated: September 9, 2008 By: /Jonathan O. Owens/

Jonathan O. Owens 
Reg. No.: 37,902 
Attorneys for Applicants 